1. Introduction

Since Kolmogorov and Smirnov established their limiting distribution theorems concerning maximal deviations between empirical and theoretical distributions, statisticians have done an increasing amount of work in this field. The work is motivated both by its practical importance and by its theoretical interest. In the last five years researchers have obtained considerable results on distribution laws, power considerations and the limiting process. The present paper considers only a few of these results, those nearest to the author's work and interests of the past few years. Section 2 concerns the case when the parent distribution is discontinuous. Sections 3 and 4 consider the two sample and the one sample problem, respectively. Section 5 discusses an analogous question for a density function, due to Revesz, and Section 6 considers the two dimensional problem.


2. The Gnedenko-Korolyuk distribution for discontinuous random variables

In his paper Schmid [12] gave the limiting distribution laws of the Kolmogorov and the Smirnov statistics, that is, of

$$D_n = \sup_{x} |F_n(x) - F(x)|, \qquad D_n^+ = \sup_{x} [F_n(x) - F(x)], \tag{2.1}$$

for discontinuous $F(x)$. Here $F_n(x)$ denotes the empirical distribution function of a sample of size $n$ from a population distributed according to $F(x)$. Using the ballot lemma, Csaki [2] determined the exact distribution of $D_n^+$ for finite $n$. This corresponds to the well known Smirnov-Birnbaum-Tingey distribution for continuous $F(x)$. His formula has a fairly complicated form.

The two sample case for discontinuous $G(x) = F(x)$ was considered by the author [20]. For finite $m = n$ he determined the exact distributions of

$$D_{n,n}^+ = \max_{x} [F_n(x) - G_n(x)], \qquad D_{n,n} = \max_{x} |F_n(x) - G_n(x)|, \tag{2.2}$$

as well as their limiting forms as $n \to \infty$.

Let $F(x)$ have jumps at $x_1, \dots, x_r$, with $x_i < x_{i+1}$, and be continuous otherwise. Let $F(x)$ be left continuous and satisfy the following relations, with $x_0 = -\infty$ and $x_{r+1} = +\infty$:

$$F(x_i) - F(x_{i-1} + 0) = p_i, \quad i = 1, \dots, r+1; \qquad F(x_i + 0) - F(x_i) = q_i, \quad i = 1, \dots, r, \tag{2.3}$$

where $\sum_{i=1}^{r+1} p_i + \sum_{i=1}^{r} q_i = 1$.

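For the limiting distributions we have, for $y \ge 0$,

$$\lim_{n \to \infty} P\left( \left(\frac{n}{2}\right)^{1/2} D_{n,n}^+ < y \right) = \frac{1}{(2\pi)^{r} p_{r+1}^{1/2}} \int \cdots \int_{G^+} \prod_{i=1}^{r+1} \left[ 1 - \exp\left\{ -\frac{2}{p_i} (y - S_{i-1} - T_{i-1})(y - S_i - T_{i-1}) \right\} \right] \exp\left\{ -\frac{1}{2} \sum_{i=1}^{r} (u_i^2 + w_i^2) - \frac{1}{2p_{r+1}} (S_r + T_r)^2 \right\} \prod_{i=1}^{r} du_i \, dw_i, \tag{2.4}$$

where

$$S_0 = 0, \quad S_i = \sum_{j=1}^{i} u_j p_j^{1/2}, \quad i = 1, \dots, r+1; \qquad T_0 = 0, \quad T_i = \sum_{j=1}^{i} w_j q_j^{1/2}, \quad i = 1, \dots, r, \tag{2.5}$$

with $u_{r+1} = -(S_r + T_r)\, p_{r+1}^{-1/2}$, so that $S_{r+1} + T_r = 0$. For the domain of integration we have

$$G^+ = \{ S_{i-1} + T_{i-1} < y,\ S_i + T_{i-1} < y,\ i = 1, \dots, r+1 \}. \tag{2.6}$$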

In the two sided case the following form holds, for $y > 0$:

$$\lim_{n \to \infty} P\left( \left(\frac{n}{2}\right)^{1/2} D_{n,n} < y \right) = \frac{1}{(2\pi)^{r} p_{r+1}^{1/2}} \int \cdots \int_{G} \prod_{i=1}^{r+1} \left[ \sum_{j=-\infty}^{\infty} \left( \exp\left\{ -\frac{2}{p_i} (2jy - u_i p_i^{1/2})\, 2jy \right\} - \exp\left\{ -\frac{2}{p_i} [(2j+1)y + S_i + T_{i-1}][(2j+1)y + S_{i-1} + T_{i-1}] \right\} \right) \right] \exp\left\{ -\frac{1}{2} \sum_{i=1}^{r} (u_i^2 + w_i^2) - \frac{1}{2p_{r+1}} (S_r + T_r)^2 \right\} \prod_{i=1}^{r} du_i \, dw_i. \tag{2.7}$$

Here $S_i$ and $T_i$ are the same as above. For the domain of integration we have

$$G = \{ -y < S_{i-1} + T_{i-1} < y,\ -y < S_i + T_{i-1} < y,\ i = 1, \dots, r+1 \}. \tag{2.8}$$


As in the one sample case, the distributions are no longer independent of $F(x)$. They depend on the values $F(x_i)$ and $F(x_i + 0)$ at the points of discontinuity, but only on them.

The case $p_i = 0$ for each $i$, that is, the case of a discrete distribution, was considered by S. Sujan [16]. For finite $n$ the corresponding distributions can be obtained from the formulas given in the paper cited, but the above relations do not work for $p_i = 0$. According to Sujan's calculation for purely discrete random variables, with our notation above, the following holds:

$$\lim_{n \to \infty} P\left( \left(\frac{n}{2}\right)^{1/2} D_{n,n}^+ < y \right) = \left( \frac{1}{(2\pi)^{r-1} q_r} \right)^{1/2} \int \cdots \int_{G_{r-1}^+} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{r-1} w_i^2 - \frac{1}{2q_r} T_{r-1}^2 \right\} \prod_{i=1}^{r-1} dw_i, \tag{2.9}$$

where $G_{r-1}^+ = \{ T_i < y,\ i = 1, \dots, r-1 \}$.

The corresponding limit relation for $D_{n,n}$ has the same form. The single exception is that the domain of integration has the two sided form

$$G_{r-1} = \{ -y < T_i < y,\ i = 1, \dots, r-1 \}. \tag{2.10}$$
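A quick consistency check of (2.9) is available in the smallest discrete case, $r = 2$, with two atoms of masses $q_1$ and $q_2 = 1 - q_1$. There is then a single variable $w_1$, with $T_1 = w_1 q_1^{1/2}$. Because $q_1 + q_2 = 1$, the exponent reduces to $-w_1^2/(2q_2)$. The right-hand side should therefore equal $\Phi(y/(q_1 q_2)^{1/2})$, where $\Phi$ is the standard normal distribution function.

The Python sketch below evaluates the integral by the trapezoidal rule and compares it with that closed form for $y > 0$. The function names are ours. Truncating the lower limit of integration at $-12\, q_2^{1/2}$ is an assumption of the sketch.

```python
import math

def limit_2_9_two_atoms(y, q1, steps=20000):
    """Right-hand side of (2.9) for r = 2 by the trapezoidal rule (y > 0, 0 < q1 < 1)."""
    q2 = 1.0 - q1
    const = (1.0 / (2.0 * math.pi * q2)) ** 0.5   # ((2 pi)^(r-1) q_r)^(-1/2) with r = 2

    def integrand(w):
        t = w * math.sqrt(q1)                      # T_1 = w_1 q_1^(1/2)
        return math.exp(-0.5 * w * w - t * t / (2.0 * q2))

    lower = -12.0 * math.sqrt(q2)   # assumption: the integrand is negligible below this
    upper = y / math.sqrt(q1)       # the domain G_1^+ = {T_1 < y}
    h = (upper - lower) / steps
    total = 0.5 * (integrand(lower) + integrand(upper))
    total += sum(integrand(lower + i * h) for i in range(1, steps))
    return const * h * total

def normal_cdf(z):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    y, q1 = 0.3, 0.25
    print(limit_2_9_two_atoms(y, q1), normal_cdf(y / math.sqrt(q1 * (1 - q1))))
```

For $r \ge 3$ the same integrand can be evaluated with a multidimensional rule, but no comparably simple closed form is available.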


It was pointed out by Kolmogorov and proved by Noether [9] that, for a critical value $c$, the relation

$$P(D_n > c \mid F \text{ is discrete}) \le P(D_n > c \mid F \text{ is continuous}) \tag{2.11}$$

holds. Consider a test of given size $\alpha$ based on the Kolmogorov statistic

$$D_n = \sup_{x} |F_n(x) - F(x)|. \tag{2.12}$$

By (2.11), the critical region of such a test must be at least as large as in the continuous case. With the critical values of the continuous case the test is conservative. Table I gives numerical calculations carried out by Sujan showing how pessimistic the Kolmogorov-Smirnov two sample test may be in the discrete case.


TABLE I

Numerical Example of Pessimism of the Kolmogorov-Smirnov Two Sample Test for the Case $n = 3$, $r = 2$, $c = 0$, $q_2 = 1 - q_1$

$q_1$            $P(D_{3,3}^+ > 0)$
1/4              0.308
1/3              0.332
1/2              0.363
F continuous     0.75

The last row corresponds to the continuous case. The example chosen is a very extreme case. It can be shown that the least pessimistic case is $p_1 = p_2 =$

3. On distributions of statistics connected with the two sample problem

In the last two decades a great deal of effort has been expended to determine the exact probabilities of the Kolmogorov-Smirnov two sample statistic. These probabilities were known for special sample sizes only ($m = kn$, $k$ a positive integer). In this respect, the paper of Steck [15] contains far reaching results. Using ideas of Maag and Stephens [7] and also of Lehmann, Steck expressed the Smirnov statistics in terms of the ranks of one sample. He then gave the distribution explicitly in the form of a determinant when one underlying distribution is a power of the other, $G(x) = [F(x)]^k$. Further, he gave determinant formulas for the frequency content, under the null hypothesis, of any parallelepiped in the sample space of the ranks of one sample. From these he obtained the null joint distribution of the one sided statistics, and thus the null distribution of the two sided statistic, for arbitrary sample sizes.

The use of a pair of statistics as a test statistic was introduced by Vincze [19]. Since then, several authors have determined joint distribution laws from this point of view. V. Sujan [17] obtained the joint distribution of the maximum deviation and the number of runs. Among other things, she proved that these two statistics are asymptotically independent. Mohanty and Pestros [8] considered joint distributions of different rank statistics. In my paper the test was examined when the alternative is specified, in which case the likelihood ratios are completely ordered. For one sided alternatives, the authors mentioned constructed the test using a partial ordering in the range space of the joint statistic, based on certain relations among likelihood ratios.

These considerations led to developments in the theory of simple random walks. In addition, Dwass [4] constructed an interesting new method based on the technique of generating functions.

For the former point we refer to the paper of Sen [13]. Among other results, it gives a systematic treatment of certain useful path transformations.

In [4] the random walk $\{\vartheta_1, \vartheta_2, \dots\}$ is considered with $P(\vartheta_i = +1) = p$ and $P(\vartheta_i = -1) = 1 - p = q$, the $\vartheta_i$ being independent. Assuming that $p > q$, the random walk returns to the origin at most finitely often with probability one. Denote by $T$ the last index for which the partial sum $s_i = \vartheta_1 + \vartheta_2 + \cdots + \vartheta_i$ vanishes, that is, $s_T = 0$ with $T$ even. If the walk never returns, $T = 0$. Let $U$ be a function defined on the random walk that is completely determined by the first $T$ of the $\vartheta_i$. Let us define

$$U_n(\vartheta_1, \vartheta_2, \dots, \vartheta_{2n}) = U(\vartheta_1, \vartheta_2, \dots, \vartheta_T) \quad \text{when } T = 2n. \tag{3.1}$$

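The $U_n$ can be regarded as those rank statistics which play a role in two sample problems. Dwass [4] gave the relation

$$E(U) = \sum_{n=0}^{\infty} E(U_n)\, P(T = 2n) = (p - q) \sum_{n=0}^{\infty} \binom{2n}{n} (pq)^n E(U_n). \tag{3.2}$$

This relation makes it possible for him to derive previously known and also new distributions in a very simple way. Here $E(U_n)$ is taken with all $\binom{2n}{n}$ paths of length $2n$ ending at the origin equally likely. Each such path has probability $(pq)^n$, and $p - q = 1 - 2q$ is the probability that the walk, once at the origin, never returns to it.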
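The identity (3.2) can be tried out numerically. The Python sketch below uses hypothetical helper names. It computes $E(U_n)$ by enumerating all $\binom{2n}{n}$ balanced paths and truncates the series after $n = n_{\max}$; the truncation is an assumption that is harmless when $4pq$ is well below 1.

Two statistics have closed forms to compare with:

- For $U \equiv 1$ the series sums to 1, because $\sum_n \binom{2n}{n} x^n = (1 - 4x)^{-1/2}$ and $1 - 4pq = (p - q)^2$.
- Let $U$ be the indicator that the walk never rises above the origin before time $T$. Then $E(U_n) = 1/(n+1)$ (a ballot count), which leads to $E(U) = (p - q)/p$.

```python
from itertools import combinations
from math import comb

def balanced_paths(n):
    """Yield every +1/-1 step sequence of length 2n that returns to the origin."""
    for ups in combinations(range(2 * n), n):
        steps = [-1] * (2 * n)
        for i in ups:
            steps[i] = 1
        yield steps

def uniform_mean(statistic, n):
    """E(U_n): average of the statistic over the comb(2n, n) balanced paths."""
    return sum(statistic(path) for path in balanced_paths(n)) / comb(2 * n, n)

def dwass_rhs(statistic, p, n_max):
    """Right-hand side of (3.2), truncated after the term n = n_max."""
    q = 1.0 - p
    if not p > q:
        raise ValueError("(3.2) assumes p > q")
    return (p - q) * sum(
        comb(2 * n, n) * (p * q) ** n * uniform_mean(statistic, n)
        for n in range(n_max + 1)
    )

def always_one(path):
    return 1.0

def never_above_origin(path):
    """Indicator that no partial sum of the path is positive."""
    s = 0
    for step in path:
        s += step
        if s > 0:
            return 0.0
    return 1.0

if __name__ == "__main__":
    p = 0.9
    print(dwass_rhs(always_one, p, 8))          # close to 1
    print(dwass_rhs(never_above_origin, p, 8))  # close to (p - q) / p
```

The second closed form can also be seen by conditioning on the first step. A down step is followed by a sure return to the origin. An up step leads back to the origin, through a positive excursion, with probability $q/p$. The equation $x = qx + p(1 - q/p)$ then gives $x = (p - q)/p$.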

An example in [19] concerns a given simple alternative. It shows that using a pair of statistics instead of one statistic reduces the probability of an error of the second kind to one half or one quarter of its previous value. In any case, depending on the alternative, the error of the second kind can be diminished from extremely high values to values acceptable from the practical point of view.

The examples mentioned concern the pair $(D_{n,n}^+, R_{n,n}^+)$. Here $D_{n,n}^+$ is the one sided maximal deviation between the two empirical distribution functions. $R_{n,n}^+$ is the index, in the ordered union of the two samples, of the sample element at which the maximum $D_{n,n}^+$ first occurs. The test based on $(D_{n,n}^+, R_{n,n}^+)$ was compared with the test based on $D_{n,n}^+$ only for equal sample sizes, $n = 10, 30, 50$. We thought that the power of the Kolmogorov-Smirnov test would be improved in this way for arbitrary sample sizes, that is, for the test based on $(D_{n,m}^+, R_{n,m}^+)$ compared with the one based on $D_{n,m}^+$.

Surprisingly, Steck [15] showed that the situation is not so simple: it depends on the relationship between $m$ and $n$. If they are relatively prime, then precisely one value of $R_{n,m}^+$ is associated with a given value of $D_{n,m}^+$. Hence any test based on $(D_{n,m}^+, R_{n,m}^+)$ is equivalent to one based on $D_{n,m}^+$ alone. The consequence is interesting and surprising. In a given case with $\sup_x [F(x) - G(x)] = 0.2$, using $(D_{50,50}^+, R_{50,50}^+)$ gives an error of the second kind of 0.047. Using only $D_{50,50}^+$, the corresponding value is 0.173, nearly four times as much. When we turn to samples of sizes $n = 50$ and $m = 51$, my method does not lead to any improvement of the two sample Smirnov test. What is the probability of the error of the second kind in that case? How does it relate to the above two values, and what is the asymptotic relation of the corresponding power functions? These are questions of interest.
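Steck's observation can be checked by brute force for small samples. The Python sketch below uses our own helper names and runs through every arrangement of the pooled ordered sample. For each arrangement it records $D_{n,m}^+$ together with the first index $R_{n,m}^+$ at which it is reached. It then reports whether each value of $D_{n,m}^+$ comes with only one value of $R_{n,m}^+$. Exact fractions are used so that equal deviations compare equal.

```python
from fractions import Fraction
from itertools import combinations

def d_plus_and_index(first_ranks, n, m):
    """(D+_{n,m}, R+_{n,m}) for one arrangement of the pooled ordered sample.

    first_ranks: 0-based positions, in the pooled ordered sample of size n + m,
    of the n observations of the first sample (the rest belong to the second).
    R+ is the 1-based index at which the maximum of F_n - G_m is first reached
    (0 if the maximum is the initial value 0).
    """
    first = set(first_ranks)
    k = l = 0
    best, best_index = Fraction(0), 0
    for index in range(1, n + m + 1):
        if index - 1 in first:
            k += 1
        else:
            l += 1
        diff = Fraction(k, n) - Fraction(l, m)
        if diff > best:            # strict '>' keeps the FIRST index
            best, best_index = diff, index
    return best, best_index

def index_determined_by_deviation(n, m):
    """True if every attainable value of D+ occurs together with a single value of R+."""
    seen = {}
    for ranks in combinations(range(n + m), n):
        d, r = d_plus_and_index(ranks, n, m)
        seen.setdefault(d, set()).add(r)
    return all(len(indices) == 1 for indices in seen.values())

if __name__ == "__main__":
    for n, m in [(3, 4), (4, 5), (3, 3), (4, 6)]:
        print(n, m, index_determined_by_deviation(n, m))
```

For relatively prime sizes such as (3, 4) or (4, 5) the check succeeds. This matches the elementary argument: when $\gcd(m, n) = 1$, a positive value of $k/n - l/m$ determines $(k, l)$, and hence the index $k + l$. For $m = n$, or for sizes such as (4, 6), it fails.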

I should like to mention that in our paper with Reimann [10], statistics of the type

$$nF_n(x) - mG_m(x) \tag{3.3}$$

were considered. Their distribution can be determined easily and does not show the irregularity for relatively prime $m$ and $n$ found by Steck.

4. Some questions connected with the one sample problem

An exact formula for the distribution of the Smirnov statistic

$$D_n^+ = \sup_{x} [F_n(x) - F(x)] \tag{4.1}$$

was known very early (Smirnov 1944; independently Birnbaum and Tingey 1951). The distribution of the two sided statistic

$$D_n = \sup_{x} |F_n(x) - F(x)| \tag{4.2}$$

was obtained only within the last five years. Durbin [3] derived the generating function for the probabilities that the empirical distribution function lies between two parallel straight lines. He obtained recursion formulas and, for a particular case, namely for a certain integral value of a parameter, exact formulas. Epanechnikov [5] determined the exact probabilities by means of a recurrence relation. Independently, Steck in his paper [14] gives, as he says, “a neat determinant for the probability that the order statistics for a sample of uniform random variables all lie in a multidimensional rectangle. An immediate application of this result gives the probability that the empirical distribution function lies between two other distribution functions.” These authors have obtained a result on which a great deal of effort was expended in the last two decades.

Turning to the one sided case, the ballot lemma (see Takacs [18]) gives very simple derivations of the distribution of $D_n^+$ and of the distributions of a number of related statistics.
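The one sided exact distribution can be evaluated directly, which is useful for comparison with the derivations that follow. The Python sketch below uses the Smirnov-Birnbaum-Tingey formula in the form in which it is usually quoted:

$$P(D_n^+ \ge \varepsilon) = \varepsilon \sum_{j=0}^{\lfloor n(1-\varepsilon) \rfloor} \binom{n}{j} \left(1 - \varepsilon - \frac{j}{n}\right)^{n-j} \left(\varepsilon + \frac{j}{n}\right)^{j-1}, \qquad 0 < \varepsilon \le 1.$$

The sketch checks this against a seeded simulation from the uniform distribution on (0, 1). Using the uniform distribution suffices because $D_n^+$ is distribution free for continuous $F$. The function names and the simulation size are our choices.

```python
import math
import random

def birnbaum_tingey_tail(n, eps):
    """P(D_n^+ >= eps) for continuous F and 0 < eps <= 1."""
    total = 0.0
    for j in range(math.floor(n * (1 - eps)) + 1):
        total += math.comb(n, j) * (1 - eps - j / n) ** (n - j) * (eps + j / n) ** (j - 1)
    return eps * total

def d_plus_uniform(sample):
    """D_n^+ = sup_x [F_n(x) - x] for a sample from the uniform distribution on (0, 1)."""
    u = sorted(sample)
    n = len(u)
    # F_n jumps to (i + 1) / n at the (i + 1)-th order statistic
    return max((i + 1) / n - x for i, x in enumerate(u))

def simulated_tail(n, eps, trials, seed=0):
    """Monte Carlo estimate of P(D_n^+ >= eps) with a fixed seed."""
    rng = random.Random(seed)
    hits = sum(d_plus_uniform([rng.random() for _ in range(n)]) >= eps
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print(birnbaum_tingey_tail(5, 0.3), simulated_tail(5, 0.3, 20000, seed=1))
```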

The following modified and extended form of the ballot lemma, suggested by me, leads almost immediately to the Smirnov-Birnbaum-Tingey theorem.

Let $A_0, A_1, A_2, \dots, A_n$ be a complete system of events such that $P(A_0) = p$ and $P(A_i) = q$ for $i = 1, \dots, n$, with $p + nq = 1$. Denote by $\nu_i$ the frequency of the event $A_i$ in $n$ independent trials. Then the following relation holds:

$$P\left( \sum_{j=1}^{i} \nu_j < i,\ i = 1, \dots, n \right) = p. \tag{4.3}$$
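Relation (4.3) is easy to confirm for small $n$ by exhaustive enumeration. The Python sketch below uses a hypothetical helper name and exact rational arithmetic. It sums the probabilities of all $(n+1)^n$ outcome sequences of the $n$ trials that satisfy $\nu_1 + \cdots + \nu_i < i$ for every $i$. The result should be exactly $p$.

```python
from fractions import Fraction
from itertools import product

def ballot_probability(n, p):
    """Exact P(nu_1 + ... + nu_i < i for i = 1, ..., n) in n independent trials.

    Each trial produces A_0 with probability p, or one of A_1, ..., A_n with
    probability q = (1 - p) / n each, so that p + n q = 1.
    """
    p = Fraction(p)
    q = (1 - p) / n
    total = Fraction(0)
    for outcome in product(range(n + 1), repeat=n):   # outcome[t] = event index in trial t
        nu = [0] * (n + 1)
        for event in outcome:
            nu[event] += 1
        partial, ok = 0, True
        for i in range(1, n + 1):
            partial += nu[i]
            if partial >= i:          # the condition nu_1 + ... + nu_i < i fails
                ok = False
                break
        if ok:
            prob = Fraction(1)
            for event in outcome:
                prob *= p if event == 0 else q
            total += prob
    return total

if __name__ == "__main__":
    print(ballot_probability(4, Fraction(1, 3)))   # equals p by (4.3)
```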

It is easy to see that an elementary proof of this theorem can be obtained from the following nice lemma, due to Tusnady [2], who also proved it.

Let the points $P_1, P_2, \dots, P_n$ be given on a directed circle of unit circumference, and choose a positive number $q$ such that $0 < nq < 1$. Starting from an arbitrary point $Q$ of the circle, construct the points $Q_1, Q_2, \dots, Q_n$ consecutively in the positive direction, with arc length $QQ_i = iq$. Call $Q$ a point of the first category if, for each $k = 1, \dots, n$, the arc $QQ_k$ contains fewer than $k$ of the points $P_1, P_2, \dots, P_n$. Then the measure of the set of points of the first category is $1 - nq$.

Proof (Tusnady). To each point $P_i$ a chain

$$C_i = \{R_{i,1}, R_{i,2}, \dots, R_{i,\nu(i)}\} \tag{4.4}$$

is assigned as follows. These points are consecutive, but in the negative direction on the circle. Further, for $j = 1, \dots, \nu(i)$, the arc $R_{i,j}P_i$ contains at least $j$ of the points $P_1, P_2, \dots, P_n$, including the point $P_i$. For $j = \nu(i) + 1$ it contains fewer than $j$ of them.

It can easily be seen that if a chain covers the point $P_j$, then it covers the chain $C_j$ as well.

A chain is called maximal if no other chain covers it. Two maximal chains are disjoint; hence the total length of all maximal chains is $nq$.

A point $Q$ on the circle is now of the first category if and only if no chain covers it. Consequently the measure of all points of the first category is $1 - nq$, as stated above.
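The statement of the lemma can also be checked numerically. Whether $Q$ is of the first category can change only when $Q$ passes one of the points $P_i$ or one of the points $P_i - kq$. Testing one point inside each arc between consecutive break points therefore gives the measure exactly, up to rounding. The Python sketch below does this. It checks the conclusion $1 - nq$; it is not an implementation of Tusnady's chain argument. The function name is ours.

```python
def first_category_measure(points, q):
    """Measure of the set of points Q of the first category in Tusnady's lemma.

    points: positions of P_1, ..., P_n on the circle, as numbers in [0, 1);
    q: a positive number with n * q < 1. Q is of the first category if, for every
    k = 1, ..., n, the arc of length k * q starting at Q in the positive direction
    contains fewer than k of the points.
    """
    n = len(points)
    if not 0 < n * q < 1:
        raise ValueError("need 0 < n q < 1")
    # The status of Q can only change where Q crosses some P_i or some P_i - k q.
    cuts = sorted({(p - k * q) % 1.0 for p in points for k in range(n + 1)})
    cuts.append(cuts[0] + 1.0)  # close the circle
    measure = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = ((a + b) / 2.0) % 1.0
        # forward distances from Q = mid to the points, smallest first
        offsets = sorted((p - mid) % 1.0 for p in points)
        # "fewer than k points within k q" <=> the k-th nearest point lies beyond k q
        if all(offsets[k - 1] > k * q for k in range(1, n + 1)):
            measure += b - a
    return measure

if __name__ == "__main__":
    pts = [0.05, 0.1, 0.12, 0.5, 0.77]
    print(first_category_measure(pts, 0.15), 1 - len(pts) * 0.15)
```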

Unfortunately a “two sided ballot lemma”, which would be a tool for deriving the distribution of $D_n$, for example, does not yet exist.

Added in proof. See S. G. Mohanty, “Combinatorial methods in probability and statistics,” lecture presented at the 58th Session of the Indian Science Congress Association, 1971.

A number of interesting articles concerning the power or asymptotic power of Kolmogorov-Smirnov tests have appeared in recent years. It is beyond our scope to discuss them. As a nice summary of certain results in this field, I would mention the book of Hajek and Sidak [6].


5. Distribution of a Renyi type statistic due to Revesz

Let the null hypothesis be that the density function is fully specified by $f(x)$, a function satisfying certain conditions (for example, $f'(x)$ exists and $|f'(x)|$ is bounded).

Let

$$x_0 < x_1 < x_2 < \cdots < x_{a_n} \tag{5.1}$$

be a division of the real axis or of the interval where $f(x) > 0$. The $a_n$, $n = 1, 2, \dots$, are restricted by the relations

$$n^{1/3} \log n < a_n < n^{1-\varepsilon}, \qquad \varepsilon > 0. \tag{5.2}$$

Denote by $k_{i+1}$ the number of elements of the sample $(X_1, X_2, \dots, X_n)$ in the interval $[x_i, x_{i+1})$, $i = 0, 1, \dots, a_n - 1$. The empirical density function is defined by

$$f_n(x) = \frac{k_{i+1}}{n(x_{i+1} - x_i)}, \qquad x_i \le x < x_{i+1}, \quad i = 0, 1, \dots, a_n - 1. \tag{5.3}$$
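Definition (5.3) transcribes directly into a short Python sketch. The function name and the choice to return 0 outside $[x_0, x_{a_n})$ are our assumptions.

```python
from bisect import bisect_right

def empirical_density(sample, cuts):
    """Histogram density (5.3) for the division cuts = [x_0, x_1, ..., x_{a_n}].

    Returns f_n with f_n(x) = k_{i+1} / (n (x_{i+1} - x_i)) for x_i <= x < x_{i+1},
    where k_{i+1} is the number of sample points in [x_i, x_{i+1}).
    """
    n = len(sample)
    a_n = len(cuts) - 1
    counts = [0] * a_n
    for value in sample:
        i = bisect_right(cuts, value) - 1      # cell index i with x_i <= value < x_{i+1}
        if 0 <= i < a_n:
            counts[i] += 1

    def f_n(x):
        i = bisect_right(cuts, x) - 1
        if not 0 <= i < a_n:
            return 0.0                         # assumption: zero outside [x_0, x_{a_n})
        return counts[i] / (n * (cuts[i + 1] - cuts[i]))

    return f_n

if __name__ == "__main__":
    f = empirical_density([0.1, 0.2, 0.25, 0.7], [0.0, 0.5, 1.0])
    print(f(0.3), f(0.9))
```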

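Revesz [11] proved the following limiting relations, which are distribution free:

$$\lim_{n \to \infty} P\left( \left(\frac{n}{a_n}\right)^{1/2} \sup_{x_\alpha < x < x_\beta} \frac{f_n(x) - f(x)}{f^{1/2}(x)} < (2\log a_n - \log\log a_n + y)^{1/2} \right) = \exp\left\{ -\frac{e^{-y/2}}{2\pi^{1/2}} \right\}, \tag{5.4}$$

$$\lim_{n \to \infty} P\left( \left(\frac{n}{a_n}\right)^{1/2} \sup_{x_\alpha < x < x_\beta} \frac{|f_n(x) - f(x)|}{f^{1/2}(x)} < (2\log a_n - \log\log a_n + y)^{1/2} \right) = \exp\left\{ -\frac{e^{-y/2}}{\pi^{1/2}} \right\}. \tag{5.5}$$

The values $x_\alpha = x(\alpha, n)$ and $x_\beta = x(\beta, n)$ form an interval on which $f(x) \ge (\log n)^{-1/3}$. Further,

$$\int_{x_\alpha}^{x_\beta} f(x)\, dx \to 1 \quad \text{as } n \to \infty. \tag{5.6}$$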

6. On a two dimensional analogue of the Gnedenko-Korolyuk distribution


The difficulty of constructing distribution free methods for samples of two or more variate random variables is well known. Recently Bickel [1] gave a distribution free version of the Smirnov two sample statistic in the $p$ variate case. We shall consider the original Smirnov statistics in two dimensions,

$$D_{n,n}^+ = \sup_{(x,y)} [F_n(x,y) - G_n(x,y)], \qquad D_{n,n} = \sup_{(x,y)} |F_n(x,y) - G_n(x,y)|, \tag{6.1}$$

in the case of equal sample sizes. Our consideration leads to an immediate extension of the random walk model. As is known, and as will be illustrated below, the distribution of $D_{n,n}^+$ or $D_{n,n}$ does depend on the common continuous theoretical distribution function $G(x,y) = F(x,y)$ of the two samples. Our aim is to propose a problem concerning the independent case $F(x,y) = H_1(x)H_2(y)$. This problem shows the difficulties even in this simple, distribution free, case.

Let $(X_i, Y_i)$ and $(X_i', Y_i')$, $i = 1, \dots, n$, be two samples with

$$P(X_i < x, Y_i < y) = P(X_i' < x, Y_i' < y) = F(x, y). \tag{6.2}$$


Since $F(x, y)$ is continuous, there is, with probability one, a “two dimensional ordering” of the two samples, constructed as follows. Let $\eta_1^* < \eta_2^* < \cdots < \eta_{2n}^*$ be the ordered union of the samples $(Y_1, Y_2, \dots, Y_n)$ and $(Y_1', Y_2', \dots, Y_n')$. Denote by $\xi_i$ the $X$ or $X'$ corresponding to $\eta_i^*$, so that we have

$$(\xi_1, \eta_1^*), (\xi_2, \eta_2^*), \dots, (\xi_{2n}, \eta_{2n}^*). \tag{6.3}$$

Now take the ordered version of the $\xi_i$,

$$\xi_1^* < \xi_2^* < \cdots < \xi_{2n}^*. \tag{6.4}$$

We introduce the following random variables:

$$\vartheta_{i,j} = \begin{cases} +1 & \text{if } \eta_i^* = Y_h \text{ and } \xi_i = \xi_j^* = X_h, \\ -1 & \text{if } \eta_i^* = Y_h' \text{ and } \xi_i = \xi_j^* = X_h', \\ 0 & \text{otherwise.} \end{cases} \tag{6.5}$$

Now we have an arrangement of $+1$'s and $-1$'s, $n$ of each, in a $2n \times 2n$ table containing exactly one nonzero element in each row and in each column. This corresponds to, and is analogous with, the random walk used first by Gnedenko and Korolyuk and independently by Drion. Let us introduce the “partial” sums as follows:

$$s_{0,0} = 0, \qquad s_{k,l} = \sum_{i \le k} \sum_{j \le l} \vartheta_{i,j}, \quad 1 \le k \le 2n,\ 1 \le l \le 2n, \qquad s_{2n,2n} = 0. \tag{6.6}$$

As can be seen very easily,

$$D_{n,n}^+ = \frac{1}{n} \max_{(k,l)} s_{k,l}, \qquad D_{n,n} = \frac{1}{n} \max_{(k,l)} |s_{k,l}|. \tag{6.7}$$

Figure 1. Array of $\vartheta_{i,j}$ for $n = 4$.

For example, in Figure 1 we have $n = 4$, and, as can be verified very simply, the following relations hold:

$$D_{4,4}^+ = \frac{1}{4}, \qquad s_{4,6} = s_{6,4} = 1. \tag{6.8}$$
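The construction (6.3) to (6.6) and the first relation in (6.7) can be checked on small data. The Python sketch below uses our own helper names. It assumes no ties among the x-values or among the y-values, which holds with probability one for continuous $F$.

The sketch does three things:

- it builds the array of $\vartheta_{i,j}$ from two bivariate samples;
- it forms the partial sums $s_{k,l}$;
- it compares $\frac{1}{n}\max s_{k,l}$ with a direct evaluation of $\sup_{(x,y)} [F_n(x,y) - G_n(x,y)]$ just above every pair of sample coordinates.

```python
from fractions import Fraction

def theta_array(first, second):
    """2n x 2n array (6.5): rows follow the ordered y-values, columns the ordered x-values.

    Entry +1 marks a point of the first sample, -1 a point of the second sample,
    0 elsewhere. Assumes no ties among the x-values or among the y-values.
    """
    points = [(x, y, +1) for x, y in first] + [(x, y, -1) for x, y in second]
    size = len(points)
    x_rank = {x: j for j, x in enumerate(sorted(p[0] for p in points))}
    theta = [[0] * size for _ in range(size)]
    for i, (x, _, sign) in enumerate(sorted(points, key=lambda p: p[1])):
        theta[i][x_rank[x]] = sign
    return theta

def partial_sums(theta):
    """s[k][l] as in (6.6): sum of theta over the first k rows and first l columns."""
    size = len(theta)
    s = [[0] * (size + 1) for _ in range(size + 1)]
    for k in range(1, size + 1):
        for l in range(1, size + 1):
            s[k][l] = theta[k - 1][l - 1] + s[k - 1][l] + s[k][l - 1] - s[k - 1][l - 1]
    return s

def d_plus_via_array(first, second):
    """D+_{n,n} = (1/n) max s_{k,l}, the first relation in (6.7)."""
    s = partial_sums(theta_array(first, second))
    return Fraction(max(max(row) for row in s), len(first))

def d_plus_direct(first, second):
    """sup over (x, y) of F_n - G_n, evaluated just above every pair of sample coordinates."""
    n = len(first)
    xs = sorted(p[0] for p in first + second)
    ys = sorted(p[1] for p in first + second)
    best = Fraction(0)  # the value when (x, y) lies below all sample points
    for cx in xs:
        for cy in ys:
            f = sum(1 for x, y in first if x <= cx and y <= cy)
            g = sum(1 for x, y in second if x <= cx and y <= cy)
            best = max(best, Fraction(f - g, n))
    return best

if __name__ == "__main__":
    first = [(0.1, 0.7), (0.4, 0.2), (0.9, 0.5)]
    second = [(0.3, 0.9), (0.6, 0.1), (0.8, 0.4)]
    print(d_plus_via_array(first, second), d_plus_direct(first, second))
```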

There are altogether $(2n)!$ possible arrangements within the square. Each of them allows $\binom{2n}{n}$ possible allocations of the $+1$'s and $-1$'s, so the number of all possible configurations is $(2n)!\binom{2n}{n} = [(2n)!/n!]^2$.

Unfortunately the different arrays may have different probabilities, depending on $F(x, y)$.

Consider the two extreme cases, $Y = X$ and $Y = -X$.

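If $Y = X$, then $\vartheta_{1,1}, \vartheta_{2,2}, \dots, \vartheta_{2n,2n}$ are the nonzero terms in the $2n \times 2n$ square. Each of the $\binom{2n}{n}$ possible arrays of the $+1$'s and $-1$'s has the same probability. In this case the relations

$$P\left(D_{n,n}^+ < \frac{k}{n}\right) = 1 - \binom{2n}{n-k} \Big/ \binom{2n}{n}, \tag{6.9}$$

$$P\left(D_{n,n} < \frac{k}{n}\right) = \binom{2n}{n}^{-1} \sum_{j=-\lfloor n/k \rfloor}^{\lfloor n/k \rfloor} (-1)^j \binom{2n}{n - jk}, \qquad k = 1, \dots, n, \tag{6.10}$$

hold, that is, the Gnedenko-Korolyuk distributions are valid.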

In the second case, when $Y = -X$, the nonzero terms are $\vartheta_{1,2n}, \vartheta_{2,2n-1}, \dots, \vartheta_{2n,1}$. We mention without proof that in this case the distribution of the Kuiper statistic determined by Maag and Stephens [7] is valid. In the two sided case we have

$$P\left(D_{n,n} < \frac{k}{n}\right) = 1 - \frac{2}{\binom{2n}{n}} \sum_{j \ge 1} \left[ k\binom{2n}{n - jk} - (k+1)\binom{2n}{n - j(k+1)} \right], \qquad k = 2, 3, \dots, n. \tag{6.11}$$

This was derived for finding the distribution of the maximum deviation when two samples of the same size $n$ are distributed uniformly on the circumference of a circle.
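Formula (6.11) can be compared with a direct count. When $Y = -X$ the nonzero entries lie on the anti-diagonal. Because the two samples have the same distribution, all $\binom{2n}{n}$ allocations of the signs are equally likely, so $P(D_{n,n} < k/n)$ is a proportion of allocations. The Python sketch below uses our own helper names. It computes $\max_{(k,l)} |s_{k,l}|$ for every allocation and compares the resulting proportion with the right-hand side of (6.11).

```python
from itertools import combinations
from math import comb

def abs_max_partial_sum(perm, signs):
    """max over (k, l) of |s_{k,l}| for the array with theta[i][perm[i]] = signs[i]."""
    size = len(perm)
    col = [0] * size          # col[l] = sum of theta[i][l] over the rows added so far
    best = 0
    for k in range(size):     # add row k
        col[perm[k]] += signs[k]
        running = 0
        for l in range(size):
            running += col[l]     # running == s_{k+1, l+1}
            best = max(best, abs(running))
    return best

def prob_enumerated(n, k):
    """P(D_{n,n} < k/n) for Y = -X, counting all sign allocations on the anti-diagonal."""
    size = 2 * n
    anti_diagonal = [size - 1 - i for i in range(size)]
    hits = 0
    for plus in combinations(range(size), n):
        signs = [1 if i in plus else -1 for i in range(size)]
        if abs_max_partial_sum(anti_diagonal, signs) < k:
            hits += 1
    return hits / comb(size, n)

def prob_formula(n, k):
    """Right-hand side of (6.11); binomials with a negative lower index are zero."""
    total = 0
    j = 1
    while n - j * k >= 0:
        total += k * comb(2 * n, n - j * k)
        if n - j * (k + 1) >= 0:
            total -= (k + 1) * comb(2 * n, n - j * (k + 1))
        j += 1
    return 1 - 2 * total / comb(2 * n, n)

if __name__ == "__main__":
    for k in range(2, 6):
        print(k, prob_enumerated(5, k), prob_formula(5, k))
```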

These two cases already show the dependence on the theoretical distribution function. Let us now turn to the independent case, for which the following problem is raised.

The distributions of the maximum deviation and of the absolute maximum deviation are to be determined when the two random variables are independent:

$$P\left(D_{n,n}^+ < \frac{k}{n} \,\Big|\, F(x,y) = H_1(x)H_2(y)\right) = \ ?, \qquad P\left(D_{n,n} < \frac{k}{n} \,\Big|\, F(x,y) = H_1(x)H_2(y)\right) = \ ? \tag{6.12}$$


In this case each array has the same probability, namely $[n!/(2n)!]^2$. The determination of the probabilities can therefore be reduced to the enumeration of those paths, that is, arrays, for which the maximum is $k/n$, or $\max_{(i,j)} s_{i,j} = k$. As the above example shows, the Markovian property does not hold. There is no “first” maximum; instead there may be two places simultaneously with $s_{i,j} = k$. Nevertheless, a reflection can be made. A path with $\{s_{0,0} = 0,\ s_{2n,2n} = 0\}$ can be transformed into a path with $\{s_{0,0} = 0,\ s_{2n,2n} = 2k\}$; however, this is not a one to one mapping of the paths.

Concerning the reflection, consider a pair of indices $(i^*, j^*)$ for which $s_{i^*,j^*} = k = \max s_{i,j}$. Change $\vartheta_{i,j}$ into $-\vartheta_{i,j}$ for those $(i, j)$ with $i > i^*$, or $j > j^*$, or both. Denote by $\alpha$ and $\beta$ the numbers of $+1$'s and $-1$'s, respectively, in the set $\{\vartheta_{i,j} : i \le i^*, j \le j^*\}$. Then $\alpha - \beta = k$. The $+1$'s outside this set, of which there are $n - \alpha$, are replaced by $-1$. The $-1$'s outside it, of which there are $n - \beta$, are replaced by $+1$. The number of $+1$'s then amounts to $\alpha + n - \beta$, while the number of $-1$'s is $\beta + n - \alpha$. Consequently

$$s_{2n,2n} = (\alpha + n - \beta) - (\beta + n - \alpha) = 2(\alpha - \beta) = 2k. \tag{6.13}$$

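Now the number of paths with $s_{0,0} = 0$ and $s_{2n,2n} = 2k$ is $(2n)!\binom{2n}{n-k}$. This number cannot exceed the number of arrays with $D_{n,n}^+ \ge k/n$. Since all arrays are equally likely, the relation

$$P\left(D_{n,n}^+ < \frac{k}{n}\right) \le 1 - \binom{2n}{n-k} \Big/ \binom{2n}{n} \tag{6.14}$$

holds.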

This relation also follows from a consideration of Nedoma (personal communication).
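The reflection behind (6.13) and the count behind (6.14) can be checked by brute force for small $n$. The Python sketch below uses our own helper names. Ties for the maximizing cell $(i^*, j^*)$ are broken arbitrarily, which the argument permits. The sketch applies the reflection to every array and performs two checks:

- the reflected array has $s_{2n,2n} = 2k$;
- for each $k$, at least $(2n)!\binom{2n}{n-k}$ arrays have $\max s_{i,j} \ge k$.

```python
from itertools import combinations, permutations
from math import comb, factorial

def partial_sums(perm, signs):
    """s[k][l] (0 <= k, l <= 2n) for the array with theta[i][perm[i]] = signs[i]."""
    size = len(perm)
    s = [[0] * (size + 1) for _ in range(size + 1)]
    for k in range(1, size + 1):
        for l in range(1, size + 1):
            entry = signs[k - 1] if perm[k - 1] == l - 1 else 0
            s[k][l] = entry + s[k - 1][l] + s[k][l - 1] - s[k - 1][l - 1]
    return s

def reflect(perm, signs):
    """Return (k, new_signs): k = max s_{i,j}; entries with i > i* or j > j* change sign."""
    size = len(perm)
    s = partial_sums(perm, signs)
    # one maximizing cell (i*, j*); ties are broken by taking the largest (i, j)
    k, i_star, j_star = max((s[i][j], i, j)
                            for i in range(size + 1) for j in range(size + 1))
    new_signs = [-sign if (row + 1 > i_star or perm[row] + 1 > j_star) else sign
                 for row, sign in enumerate(signs)]
    return k, new_signs

def check_reflection_and_bound(n):
    """True if (6.13) holds for every array and, for each k = 1..n, at least
    (2n)! * comb(2n, n - k) arrays have max s_{i,j} >= k."""
    size = 2 * n
    at_least = [0] * (n + 1)   # at_least[k] = number of arrays with max s >= k
    for perm in permutations(range(size)):
        for plus in combinations(range(size), n):
            signs = [1 if i in plus else -1 for i in range(size)]
            k, new_signs = reflect(perm, signs)
            if sum(new_signs) != 2 * k:      # s_{2n,2n} of the reflected array
                return False
            for m in range(k + 1):
                at_least[m] += 1
    return all(at_least[k] >= factorial(size) * comb(size, n - k)
               for k in range(1, n + 1))

if __name__ == "__main__":
    print(check_reflection_and_bound(2))
```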

For $n = 2$ the number of possible arrays is 144. The distribution of $D_{2,2}^+$ is contained in Table II.

TABLE II

Distribution of $D_{2,2}^+$

$k$    $P(D_{2,2}^+ < k/2)$    $1 - \binom{4}{2-k} / \binom{4}{2}$
1      15/144                  1/3 = 48/144
2      92/144                  5/6 = 120/144
3      1                       1


In the case $n = 3$ the number of possible arrangements is 14,400, for which a computer is needed.
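The enumeration mentioned above is straightforward to program. The Python sketch below uses our own helper names. It runs through all $(2n)!\binom{2n}{n}$ arrays, which are equally likely in the independent case, and tabulates $\max_{(k,l)} s_{k,l}$, that is, $n D_{n,n}^+$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

def max_partial_sum(perm, signs):
    """max over (k, l) of s_{k,l}, including s_{0,0} = 0, for theta[i][perm[i]] = signs[i]."""
    size = len(perm)
    col = [0] * size        # column sums over the rows added so far
    best = 0
    for k in range(size):
        col[perm[k]] += signs[k]
        running = 0
        for l in range(size):
            running += col[l]   # running == s_{k+1, l+1}
            best = max(best, running)
    return best

def distribution_of_max(n):
    """Counter: m -> number of the (2n)! * comb(2n, n) arrays with max s_{k,l} = m."""
    size = 2 * n
    counts = Counter()
    for perm in permutations(range(size)):
        for plus in combinations(range(size), n):
            signs = [1 if i in plus else -1 for i in range(size)]
            counts[max_partial_sum(perm, signs)] += 1
    return counts

def cumulative(counts, n):
    """[P(D+_{n,n} < k/n) for k = 1, ..., n + 1], all arrays being equally likely."""
    total = sum(counts.values())
    return [Fraction(sum(c for m, c in counts.items() if m < k), total)
            for k in range(1, n + 2)]

if __name__ == "__main__":
    for n in (2, 3):
        counts = distribution_of_max(n)
        print(n, sum(counts.values()), dict(sorted(counts.items())), cumulative(counts, n))
```

For $n = 2$ the counts are 15, 77 and 52 for maxima 0, 1 and 2. These give the cumulative values 15/144, 92/144 and 1 of Table II. The same call with $n = 3$ runs through the 14,400 arrangements.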